Average Profile and Limiting Distribution for a Phrase Size in
Authors
Abstract
Consider the parsing algorithm due to Lempel and Ziv that partitions a sequence of length n into variable-length phrases (blocks) such that each new block is the shortest substring not seen in the past as a phrase. In practice, the following parameters are of interest: the number of phrases, the size of a phrase, the number of phrases of a given size, and so forth. In this paper, we focus on the size of a randomly selected phrase and on the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryless source with unequal symbol-generation probabilities (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed, with explicitly computed mean and variance. In terms of digital search trees, we prove the normal limiting distribution of the typical depth (i.e., the length of a path from the root to a randomly selected node). The latter finding is proved by a technique that belongs to the toolkit of the "analytical analysis of algorithms", but which seems to be novel in the context of data compression.
Keywords: … length, typical depth in a digital tree, limiting distributions, average profile, Mellin transform, analytical analysis of algorithms.
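The parsing rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `lz_parse` and `phrase_profile` are hypothetical, and the handling of a trailing incomplete phrase (kept as a final block even if it repeats an earlier phrase) is an assumption made for illustration.

```python
from collections import Counter


def lz_parse(seq):
    """Split seq into phrases; each phrase is the shortest prefix of the
    remaining input that has not yet appeared as a phrase."""
    seen = set()      # phrases produced so far
    phrases = []
    start = 0         # index where the current phrase begins
    for end in range(1, len(seq) + 1):
        candidate = seq[start:end]
        if candidate not in seen:
            seen.add(candidate)
            phrases.append(candidate)
            start = end
    if start < len(seq):
        # Assumption: keep the incomplete tail as a final phrase,
        # even though it duplicates an earlier one.
        phrases.append(seq[start:])
    return phrases


def phrase_profile(phrases):
    """Count how many phrases have each size (the profile of phrase sizes)."""
    return Counter(len(p) for p in phrases)


if __name__ == "__main__":
    phrases = lz_parse("1011010100010")
    print(phrases)                        # ['1', '0', '11', '01', '010', '00', '10']
    print(dict(phrase_profile(phrases)))  # {1: 2, 2: 4, 3: 1}
```

In the digital search tree view mentioned in the abstract, each complete phrase corresponds to one node of the tree and the phrase size equals the depth of that node, so the counts returned by `phrase_profile` are also the number of nodes at each level. Averaging these counts over many sequences drawn from an asymmetric Bernoulli source would give an empirical estimate of the average profile.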
Similar resources
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
Phase II logistic profile monitoring
In many industrial and non-industrial applications, the quality of a process or product is characterized by a relationship between a response variable and one or more explanatory variables. This relationship is referred to as a profile. In the past decade, profile monitoring has been studied extensively for normal response variables, but little attention has been paid to the profile with the ...
Moving Average Processes with Infinite Variance
The sample autocorrelation function (acf) of a stationary process has played a central statistical role in traditional time series analysis, where the assumption is made that the marginal distribution has a second moment. Now, the classical methods based on acf are not applicable in heavy tailed modeling. Using the codifference function as dependence measure for such processes be shown it be as...
Comparison of geostatistical methods to determine the best interpolation method for bioclimatic data in modeling the distribution of animal species in central Iran
Climate change can impose physiological constraints on species and can therefore affect species distribution. Bioclimatic predictors, including annual trends, regimes, thresholds and bio-limiting factors, are the most important independent variables in species distribution models. Water and temperature are the most limiting factors in the arid ecosystems of central Iran. Therefore, mapping of climat...
Limiting Properties of Empirical Bayes Estimators in a Two-Factor Experiment under Inverse Gaussian Model
The empirical Bayes estimators of treatment effects in a factorial experiment were derived and their asymptotic properties were explored. It was shown that they were asymptotically optimal and the estimator of the scale parameter had a limiting gamma distribution while the estimators of the factor effects had a limiting multivariate normal distribution. A Bootstrap analysis was performed to ill...